Automatic acquisition of lexical semantic information using medium to small corpora

Authors

  • Mathias Rossignol
  • Pascale Sébillot
Abstract

Since many speech and text processing techniques can be ported from one language to another with a limited amount of work, the most daunting task for NLP and SP practitioners becomes building the resources those tools need in order to operate. In particular, the constitution of “high-level” resources, such as advanced corpus annotations or linguistically motivated lexicons, can be extremely work-intensive. We present in this paper a system to assist the creation of semantic lexicons using small to medium-sized corpora, thanks to the combination of semantic class constitution and topic detection, and the development of specific statistical data analysis techniques for relatively small datasets. By reducing the amount of data needed for semi-automatic semantic lexicon acquisition, traditionally applied to corpora of 100 million words or more, we make this help for lexical resource acquisition applicable to under-resourced languages.

Similar resources

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Second Language Information Transfer in Automatic Verb Classification – A

Second Language Information Transfer in Automatic Verb Classification – A Preliminary Investigation. Vivian Tsang, Master of Science, Graduate Department of Computer Science, University of Toronto, 2001. Lexical semantic classes incorporate both syntactic and semantic information about verbs. Lexical semantic classification of verbs provides a great deal of useful information about the possible usage ...

Towards Semi Automatic Construction of a Lexical Ontology for Persian

Lexical ontologies and semantic lexicons are important resources in natural language processing. They are used in various tasks and applications, especially where semantic processing is involved, such as question answering, machine translation, text understanding, information retrieval and extraction, content management, text summarization, knowledge acquisition and semantic search engines. Altho...

An Application of Lexical Semantics to Knowledge Acquisition from Corpora

In this paper, we describe a program of research designed to explore how a lexical semantic theory may be exploited for extracting information from corpora suitable for use in Information Retrieval applications. Unlike with purely statistical collocational analyses, the framework of a semantic theory allows the automatic construction of predictions about semantic relationships among words ap...

Enriching a lexical semantic net with selectional preferences by means of statistical corpus analysis

Broad-coverage ontologies which represent lexical semantic knowledge are being built for more and more natural languages. Such resources provide very useful information for word sense disambiguation, which is crucial for a variety of NLP tasks (e.g. semantic annotation of corpora, information retrieval, or semantic inferencing). Since the manual encoding of such ontologies is very labour-intens...

Publication date: 2008